Data Analysis

Collaborative Coding 1

Jafet Belmont

School of Mathematics and Statistics

Intended Learning Outcomes (ILOs)

By the end of this session you will be able to:

  • Understand version control and why it is useful.

  • Set up your Git/GitHub account.

  • Create a local repository and commit changes to it using GitHub Desktop.

  • Publish your local repo to GitHub.

Introduction to Version Control & Git

What is a version control system?

Version control is used throughout industry and academia as a way of developing and sharing code.

  • It is a tool that helps keep track of code and software development projects as they change over time.

  • The idea is to save snapshots of the code at any given time, called versions or commits.

  • When coding collaboratively, researchers or developers can work on the same code base, keeping individual versions of the code through branches.
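
To make the snapshot idea concrete, here is a minimal sketch in Python. It is a toy model, not how Git is implemented: the `Commit` class, the `commit` and `history` helpers, and the file names are all made up for illustration. Each commit keeps a full copy of the files, a message, and a link to its parent, and a branch is just a name pointing at a commit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Commit:
    """A toy commit: a snapshot of every file, a message, and a parent."""
    message: str
    files: dict                          # file name -> contents at that time
    parent: Optional["Commit"] = None


def commit(parent, files, message):
    # Copy the dict so later edits to the working files cannot alter history.
    return Commit(message=message, files=dict(files), parent=parent)


def history(tip):
    """Return the commit messages from newest to oldest."""
    messages = []
    while tip is not None:
        messages.append(tip.message)
        tip = tip.parent
    return messages


# Hypothetical working files for illustration.
working = {"analysis.R": "x <- 1\n"}
first = commit(None, working, "Add analysis script")

working["analysis.R"] = "x <- 2\n"
second = commit(first, working, "Change x to 2")

# Two branches share the same history up to the point where they split.
experiment = commit(second, {**working, "notes.md": "idea\n"}, "Try an idea")
branches = {"main": second, "experiment": experiment}

# Going back to an older version means reading an earlier snapshot.
restored = first.files["analysis.R"]
print(history(branches["experiment"]))
```

The work on the `experiment` branch does not touch `main` until someone chooses to integrate it.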

What is Git?

  • Git is a version control system invented by Linus Torvalds.

  • Torvalds's first commit message was as follows:

    Initial revision of "git", the information manager from hell

  • And it came with an interesting README:

    GIT - the stupid content tracker
    
    "git" can mean anything, depending on your mood.
    
     - random three-letter combination that is pronounceable, and not
       actually used by any common UNIX command.  The fact that it is a
       mispronunciation of "get" may or may not be relevant.
     - stupid. contemptible and despicable. simple. Take your pick from the
       dictionary of slang.
     - "global information tracker": you're in a good mood, and it actually
       works for you. Angels sing, and a light suddenly fills the room.
     - "goddamn idiotic truckload of sh*t": when it breaks
    
    This is a stupid (but extremely fast) directory content manager.  It
    doesn't do a whole lot, but what it _does_ do is track directory
    contents efficiently.

Introduction to Git

Git can be run offline by itself, but it is commonly used through a hosting service such as GitHub.

  • Git is a command-line program; however, you can opt to install a graphical user interface (GUI) for Git, such as GitHub Desktop.

  • In this session we will focus on using Git through GitHub Desktop, but see the notes if you are interested in running it from the command line.

  • Let’s begin by signing up for GitHub.

Sign up for GitHub

  1. You can sign up for GitHub for free.
  2. You can also download GitHub Desktop for free.
    • There are other GUIs that you can use as well (e.g., Sourcetree); here we will cover GitHub Desktop (desktop.github.com).
  3. Set your Git name and the email address you used in step 1.
  4. Connect to GitHub.
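
For those curious about the command-line equivalent of step 3, the sketch below builds the two `git config` commands that store a name and email address. The name and address are placeholders, and `identity_commands` and `run_all` are helper names invented for this example. Nothing is executed unless you call `run_all` yourself on a machine with Git installed.

```python
import shlex
import subprocess


def identity_commands(name, email):
    """Build the Git commands that store your name and email for all repos."""
    return [
        ["git", "config", "--global", "user.name", name],
        ["git", "config", "--global", "user.email", email],
    ]


def run_all(commands):
    """Run each command in turn, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)


# Placeholder identity: replace with the details from your GitHub account.
commands = identity_commands("Ada Example", "ada@example.com")

# Show what would be typed in a terminal, without running anything.
for command in commands:
    print(shlex.join(command))
```

Git attaches this name and email to every commit you make, which is how GitHub matches commits to your account.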

Why do I need this?

  • The primary reason for version control is to be able to track changes and, if needed, go back to an older version.

Imagine you want to debug some code or test a new feature.

  • Version control allows you to implement the change and only integrate it with your main code once you are sure it works.
  • It also allows you to have multiple versions:
    • E.g., one user could be using your released code while another develops a version that temporarily stops it working.

Take home

In summary, version control:

  • Allows you to develop temporary and multiple versions of your code.

  • Lets you share your code effectively with others (during or after completion).

  • Lets others use your code, and lets you use other people’s code.

  • Supports reproducibility.

You cannot claim your method works if you don’t release it in a usable state!

Git Workflow

Process Outline

There are two basic workflows for creating a repository and linking it to GitHub:

  1. Create a local repo and add (push) it to GitHub (Section 8.2 notes)
  2. Create a repo on GitHub and clone it (see Week Tasks)

Regardless of the approach, the end goal is to work collaboratively with others (we will get to this later).
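
As a rough command-line sketch of the two workflows, the functions below list the Git commands each one involves. The repository URL, the commit message, and the function names are assumptions made for illustration; in this session GitHub Desktop performs the equivalent steps through its buttons and menus.

```python
import shlex


def publish_local_repo(remote_url, branch="main"):
    """Workflow 1: start on your machine, then push to an empty GitHub repo."""
    return [
        ["git", "init"],                                # create the local repo
        ["git", "add", "."],                            # stage the current files
        ["git", "commit", "-m", "Initial commit"],      # take the first snapshot
        ["git", "branch", "-M", branch],                # name the branch
        ["git", "remote", "add", "origin", remote_url], # link to GitHub
        ["git", "push", "--set-upstream", "origin", branch],
    ]


def clone_github_repo(remote_url):
    """Workflow 2: the repo already exists on GitHub, so copy it locally."""
    return [["git", "clone", remote_url]]


# Hypothetical URL: replace with the address of your own repository.
url = "https://github.com/your-username/my-project.git"

for label, commands in [("local first", publish_local_repo(url)),
                        ("GitHub first", clone_github_repo(url))]:
    print(f"# {label}")
    for command in commands:
        print(shlex.join(command))
```

Either way you end up in the same place: a local repository that knows which GitHub repository it is linked to.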

Basic workflow for a single user

Creating a local repo

  1. Create a Git repository on your local machine by setting the Local Path field to your preferred location.
    • This will create a local folder where you can work, which will be linked to your online repo once you publish it to GitHub.

Create a file

  • We can now create a file to go into the repo

  • This file might be your Quarto document or R code

  • At this point you have not committed anything to Git (or GitHub)

Git Status - creating a file

  • You will now be able to see detected changes to your files
  1. We are on the main branch (we will cover branching next week)
  2. This is the history tab where you can see all your old commits.
  3. The tick here means that the file has been staged and is ready to be part of your next commit.

Git Status - committing

Important

The changes in your project are not stored until you tell Git that they are ready to be stored by committing them!

  • A commit is equivalent to taking a snapshot 📸 of your whole project at that time. It’s still NOT on GitHub though!

  • Each version should be accompanied by a message 💬 describing the change made by the commit.
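
The staging and committing steps above correspond roughly to the Git commands sketched below. `commit_commands` is a helper invented for this example, and the file name and message are placeholders. The check on the message is this sketch's own safeguard, reflecting the advice that every commit should describe its change.

```python
import shlex


def commit_commands(paths, message):
    """Build the commands to review, stage, and commit the given files."""
    if not message.strip():
        raise ValueError("a commit needs a message describing the change")
    return [
        ["git", "status", "--short"],       # list the detected changes
        ["git", "add", "--", *paths],       # stage: tick the files to include
        ["git", "commit", "-m", message],   # snapshot the staged files locally
    ]


# Placeholder file and message for illustration.
for command in commit_commands(["poem.txt"], "Add first draft of poem"):
    print(shlex.join(command))
```

None of these commands contacts GitHub; the commit exists only in the local repository until it is pushed.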

Git Status - Post Commit

Now that we have committed, we no longer have any local changes.

  • And our commit sits nicely in our history tab.
  • We still do not have it on GitHub though!

Git Status - publishing our repo on GitHub

  • We can now publish our repo directly from GitHub Desktop.
  • You should now be able to see it on GitHub.

Update your repo

  • Let’s go back to our local repo and change our file.

Important

You do this outside GitHub Desktop, e.g., using RStudio, Notepad, Visual Studio Code, Anaconda, etc., depending on your file.

  • Suppose we rename our file from poem.txt to poem.md.

Update your repo

From Git’s perspective, this is equivalent to deleting the old file and creating a new one.

  • Commit this change. Note that it is only committed to Git locally; the change will not be reflected on GitHub yet.
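
Why does a rename look like a delete plus a create? Git keeps no separate record of renames; it stores file contents under an ID computed from the contents alone. The sketch below reproduces that ID calculation for Git's default SHA-1 object format. The poem text is made up, and the before/after dictionaries are a simplification of what Git actually stores.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the ID Git gives to file contents (SHA-1 object format)."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Hypothetical file contents; the rename does not change them.
poem = b"Roses are red\nViolets are blue\n"

before = {"poem.txt": git_blob_id(poem)}   # snapshot before the rename
after = {"poem.md": git_blob_id(poem)}     # snapshot after the rename

deleted = sorted(set(before) - set(after))
added = sorted(set(after) - set(before))
same_content = before["poem.txt"] == after["poem.md"]

print("deleted:", deleted)
print("added:  ", added)
print("identical contents:", same_content)
```

Because the contents, and therefore the ID, are identical, tools can recognise the pair as a rename when displaying the change.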

Update your repo

Before pushing our last commit (i.e., renaming poem.txt to poem.md), let’s make a few more changes to our file.

  • Let’s first give our file a proper title using Markdown format.
  • Basically, add a # at the start of the line to make a section title.

Update your repo

  • This time we see changes to the text rather than the files
  • We can then give it a commit message 💬 and commit again
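
The text-level view of a change can be imitated with Python's standard `difflib` module. This is only an illustration of the idea, with made-up poem text and labels, and its output format is similar to, but not produced by, Git.

```python
import difflib

# Hypothetical contents of poem.md at the last commit and in the working copy.
before = "Roses are red\nViolets are blue\n"
after = "# My poem\n\n" + before

diff = list(difflib.unified_diff(
    before.splitlines(keepends=True),
    after.splitlines(keepends=True),
    fromfile="poem.md (last commit)",
    tofile="poem.md (working copy)",
))
print("".join(diff), end="")

# Lines starting with a single "+" were added; a single "-" means removed.
added = [line for line in diff
         if line.startswith("+") and not line.startswith("+++")]
removed = [line for line in diff
           if line.startswith("-") and not line.startswith("---")]
```

Here only lines were added, so the change view would mark the new title in green with nothing in red.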

Update your repo

  • We have an updated local repository, but GitHub remains the same.

How do we change this? PUSH! 👈

Update your repo

  • In Git terms, the local copy of our repository is 2 commits ahead of the copy on GitHub.

  • To synchronise both repositories, we need to push our local changes:
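
What "2 commits ahead" means can be sketched with two lists of commit messages, oldest first. This is a toy model: the messages are invented, and real Git compares commit IDs rather than messages. On the command line, the push itself is `git push`.

```python
def commits_ahead(local, remote):
    """Count the commits at the end of `local` that `remote` lacks.

    Assumes the remote history is a prefix of the local one, i.e. nobody
    else has pushed in the meantime.
    """
    if local[:len(remote)] != remote:
        raise ValueError("histories have diverged; pull before pushing")
    return len(local) - len(remote)


def push(local, remote):
    """Copy the missing commits to the remote so both histories match."""
    missing = local[len(remote):]
    remote.extend(missing)
    return missing


# Hypothetical commit messages for the poem example.
local = ["Add poem.txt", "Rename poem.txt to poem.md", "Add Markdown title"]
github = ["Add poem.txt"]

ahead_before = commits_ahead(local, github)
sent = push(local, github)
ahead_after = commits_ahead(local, github)
print(f"ahead by {ahead_before}, pushed {len(sent)}, now ahead by {ahead_after}")
```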

Update your repo

If we jump over to GitHub we can see our renamed poem.md file and 3 commits:

Special Conventions

  • .gitignore: a list of files that we don’t want to commit to our repo.

  • README: essentially the homepage of your repo. It can display download, usage, and citation information.

  • LICENSE: tells people how they are allowed to use your files. People don’t always follow it though!
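
To give a feel for how a .gitignore works, the sketch below filters a list of file names against a few wildcard patterns. The patterns and file names are hypothetical, and the matching is a simplification: real .gitignore files support further rules (directories, negation with `!`, path anchoring) that this example ignores.

```python
from fnmatch import fnmatchcase

# Hypothetical patterns, one per line as they would appear in .gitignore.
IGNORE_PATTERNS = [".Rhistory", ".RData", "*.log"]


def is_ignored(filename, patterns=IGNORE_PATTERNS):
    """Return True if the file name matches any ignore pattern."""
    return any(fnmatchcase(filename, pattern) for pattern in patterns)


# Hypothetical contents of a project folder.
files = ["analysis.R", "report.qmd", ".Rhistory", "render.log"]
to_commit = [name for name in files if not is_ignored(name)]
print(to_commit)
```

Ignored files stay on your machine but never show up as changes to commit.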

Your turn

Week Tasks:

  1. Download the content from a GitHub repo
  2. Clone the content from a GitHub repo
  3. Create your own repo directly on GitHub and clone it to your local machine